In [1]:
library(Seurat)
library(ggplot2)
library(dplyr)
options(repr.plot.width=10, repr.plot.height=8)
library(harmony)
library(clusterProfiler)
library(enrichplot)
library(msigdbr)
library(stringr)
set.seed(2025)
Loading required package: SeuratObject

Loading required package: sp

‘SeuratObject’ was built under R 4.4.1 but the current version is
4.4.2; it is recomended that you reinstall ‘SeuratObject’ as the ABI
for R may have changed

‘SeuratObject’ was built with package ‘Matrix’ 1.6.5 but the current
version is 1.7.3; it is recomended that you reinstall ‘SeuratObject’ as
the ABI for ‘Matrix’ may have changed


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, t



Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: Rcpp



clusterProfiler v4.14.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology. 2012,
16(5):284-287


Attaching package: ‘clusterProfiler’


The following object is masked from ‘package:stats’:

    filter


enrichplot v1.26.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

Guangchuang Yu. Gene Ontology Semantic Similarity Analysis Using
GOSemSim. In: Kidder B. (eds) Stem Cell Transcriptional Networks.
Methods in Molecular Biology. 2020, 2117:207-215. Humana, New York, NY.

For full functionality, please install the 'msigdbdf' package with:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')

In [2]:
in_path <- "~/projects/2024/RA/GSE163121_RAW/"
out_path <- "~/projects/2024/RA/GSE163121_RAW/out/"
In [3]:
d_name <- c('GSM4972213', 'GSM4972214', 'GSM4972215', 'GSM4972216', 'GSM4972217')
In [4]:
scdata <- list()
for (i in 1:length(d_name)) {
    data <- Read10X(data.dir = paste0(in_path,d_name[i]))
    scdata[[i]] <- CreateSeuratObject(counts = data, project = d_name[i])
}
scdata
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
[[1]]
An object of class Seurat 
33694 features across 5514 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[2]]
An object of class Seurat 
33694 features across 4121 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[3]]
An object of class Seurat 
33694 features across 3516 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[4]]
An object of class Seurat 
33694 features across 7351 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts

[[5]]
An object of class Seurat 
33694 features across 4535 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts
In [5]:
GSE163121 <- merge(scdata[[1]], y = scdata[-1], add.cell.ids = d_name)
In [6]:
GSE163121
An object of class Seurat 
33694 features across 25037 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 5 layers present: counts.GSM4972213, counts.GSM4972214, counts.GSM4972215, counts.GSM4972216, counts.GSM4972217
In [7]:
GSE163121[["percent.mt"]] <- PercentageFeatureSet(GSE163121, pattern = "^MT-")
In [8]:
GSE163121
An object of class Seurat 
33694 features across 25037 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 5 layers present: counts.GSM4972213, counts.GSM4972214, counts.GSM4972215, counts.GSM4972216, counts.GSM4972217
In [9]:
VlnPlot(GSE163121, features = c("nFeature_RNA"), pt.size = 0)
VlnPlot(GSE163121, features = c("nCount_RNA"), pt.size = 0)
VlnPlot(GSE163121, features = c("percent.mt"), pt.size = 0)
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
No description has been provided for this image
Warning message:
“Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
No description has been provided for this image
No description has been provided for this image
In [10]:
p1 <- FeatureScatter(GSE163121, feature1 = "nCount_RNA", feature2 = "percent.mt", raster = F)
p2 <- FeatureScatter(GSE163121, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", raster = F)
p1
p2
No description has been provided for this image
No description has been provided for this image
In [11]:
GSE163121 <- subset(GSE163121, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt <20)
GSE163121 <- JoinLayers(GSE163121)
GSE163121
An object of class Seurat 
33694 features across 24992 samples within 1 assay 
Active assay: RNA (33694 features, 0 variable features)
 1 layer present: counts
In [12]:
GSE163121 <- NormalizeData(GSE163121, normalization.method = "LogNormalize", scale.factor = 10000)
GSE163121 <- FindVariableFeatures(GSE163121, selection.method = "vst", nfatures = 2000)
GSE163121 <- ScaleData(GSE163121, features = rownames(GSE163121))
GSE163121 <- RunPCA(GSE163121)
Normalizing layer: counts

Finding variable features for layer counts

Centering and scaling data matrix

PC_ 1 
Positive:  TYROBP, LYZ, FCER1G, S100A8, S100A9, CSTA, FCN1, S100A12, THBS1, C15orf48 
	   G0S2, CSF3R, MAFB, CLEC7A, IL1B, TREM1, S100A11, SLC11A1, CEBPD, CYP1B1 
	   IER3, SERPINA1, SEMA6B, C5AR1, PTGES, CTSD, CDA, TNFAIP2, SERPINB2, CFD 
Negative:  EEF1A1, CD74, RPS18, RPLP0, RPLP1, IGHM, MT-CYB, FCER2, TCL1A, ISG20 
	   NEIL1, JUN, IGKC, MYC, HIST1H4C, MALAT1, KIAA0125, IL6, IGLC3, STAG3 
	   INSIG2, IGKV3-20, ZNF595, POU2AF1, IGHV5-78, IGHV3-30, IRF4, RP5-887A10.1, IGKV1-12, IGKV1-5 
PC_ 2 
Positive:  FTH1, THBS1, S100A12, LYZ, CSTA, C15orf48, EEF1A1, S100A8, PTGES, TREM1 
	   S100A9, SEMA6B, CD74, CYP1B1, G0S2, CSF3R, IL1B, FCN1, SLC11A1, SERPINB2 
	   MAFB, CDA, SAT1, CLEC7A, IFNGR2, RPLP1, FTL, ANPEP, DMXL2, FCAR 
Negative:  GZMA, GZMB, CTSW, PRF1, GZMM, CD7, CLIC3, NKG7, CD247, FGFBP2 
	   IL2RB, SPON2, KLRF1, IGFBP7, ADGRG1, KLRB1, S1PR5, ZAP70, GZMH, IL32 
	   SH2D1B, IFITM1, TTC38, C1orf21, SPN, PRKCH, SAMD3, TRBC1, CCL5, MATK 
PC_ 3 
Positive:  MALAT1, FTH1, CD7, GZMA, GZMB, CD247, NKG7, IL32, GZMM, PRF1 
	   EEF1A1, CTSW, KLRB1, FGFBP2, IL2RB, SPON2, CLIC3, PRKCH, GZMH, ZAP70 
	   KLRF1, ADGRG1, S1PR5, CD3E, CCL5, NEAT1, MATK, FCER1G, TRBC1, IGFBP7 
Negative:  TXNDC5, TNFRSF17, MZB1, DERL3, AQP3, HRASLS2, CHPF, POU2AF1, ITM2C, CD27 
	   IGLL5, CPNE5, DCPS, TMSB10, MYDGF, IGKC, PEBP1, PPIB, ISOC2, RP11-1070N10.3 
	   SSR3, LSP1, SDF2L1, EAF2, FKBP11, PYCR1, CD74, GPRC5D, PDIA5, MT-CYB 
PC_ 4 
Positive:  CD3D, CD3E, CAMK4, CD6, CD2, MAL, NELL2, CD3G, AQP3, LEPROTL1 
	   RPLP0, TRAT1, RGS10, TNFRSF25, RORA, EEF1A1, BCL11B, RPLP1, IL32, LCK 
	   DHRS3, CD28, NINJ1, GATA3, TRAC, FKBP11, CD27, LIME1, RPS26, CD96 
Negative:  CD74, MT-CO1, MT-CYB, IGHM, FGR, IFITM3, FCER2, TMSB10, IFI44L, LY6E 
	   RHOB, RGS2, IFITM2, CTSS, LSP1, TCL1A, ITGB2, SPON2, IGFBP7, C1orf162 
	   XAF1, GPR18, GZMB, PRF1, GZMA, IFI6, CLIC3, KLRF1, PLD4, CH17-373J23.1 
PC_ 5 
Positive:  HES4, NEAT1, MALAT1, LGALS1, SAT1, ZBTB32, RPS26, GPR137B, DUSP4, LMNA 
	   RHOB, SRGN, TNFRSF1B, BLVRB, LGMN, TUBB6, ENC1, SAMSN1, ATF3, AC104024.1 
	   TMSB4X, CLECL1, CD86, RGCC, FTH1, SEC61B, KPNA2, SEMA7A, BCL2A1, RGS2 
Negative:  MT-CYB, TCL1A, FCER2, TMSB10, IGHM, C1orf162, CD3E, CD3D, JUN, LCK 
	   MYC, PIM1, STAG3, IGLL5, MAL, CD6, CDC25B, CD3G, CYBB, NOSIP 
	   IGHV5-78, RP11-164H13.1, EEF1A1, IL32, MID1IP1, FCGRT, PDE4D, CAMK4, APLP2, CD2 

In [13]:
ElbowPlot(GSE163121)
No description has been provided for this image
In [14]:
DimPlot(GSE163121)
No description has been provided for this image
In [15]:
GSE163121 <- RunUMAP(GSE163121, dims = 1:9, reduction = "pca", reduction.name = "umap.unintegrated")
Warning message:
“The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session”
12:15:35 UMAP embedding parameters a = 0.9922 b = 1.112

12:15:35 Read 24992 rows and found 9 numeric columns

12:15:35 Using Annoy for neighbor search, n_neighbors = 30

12:15:35 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%

[----|----|----|----|----|----|----|----|----|----|

*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
|

12:15:38 Writing NN index file to temp file /tmp/Rtmpv61tWw/file32c7911b450abd

12:15:38 Searching Annoy index using 1 thread, search_k = 3000

12:15:48 Annoy recall = 100%

12:15:49 Commencing smooth kNN distance calibration using 1 thread
 with target n_neighbors = 30

12:15:51 Initializing from normalized Laplacian + noise (using RSpectra)

12:15:52 Commencing optimization for 200 epochs, with 1022006 positive edges

12:15:52 Using rng type: pcg

12:16:07 Optimization finished

In [16]:
DimPlot(GSE163121)
No description has been provided for this image
In [17]:
Idents(GSE163121, cells = colnames(GSE163121)[is.element(GSE163121$orig.ident,c("GSM4972213","GSM4972214"))]) <- "HC"
Idents(GSE163121, cells = colnames(GSE163121)[is.element(GSE163121$orig.ident,c("GSM4972215","GSM4972216", "GSM4972217"))]) <- "SLE"
GSE163121$conditions <- Idents(GSE163121)
DimPlot(GSE163121, raster = F, group.by = "conditions")
No description has been provided for this image
In [18]:
GSE163121 <- RunHarmony(GSE163121,"orig.ident", plot_convergence = T, theta = 3)
Transposing data matrix

Initializing state using k-means centroids initialization

Harmony 1/10

Harmony 2/10

Harmony 3/10

Harmony converged after 3 iterations

No description has been provided for this image
In [19]:
harmony_embeddings <- Embeddings(GSE163121, "harmony")
harmony_embeddings[1:5, 1:5]
A matrix: 5 × 5 of type dbl
harmony_1harmony_2harmony_3harmony_4harmony_5
GSM4972213_AAACCTGCAATGTTGC-1-2.56598081.4727543 0.7082829-0.4493815-1.9514298
GSM4972213_AAACCTGGTCCGCTGA-1-2.68299340.9718616-0.2444526 0.2220524-0.8091164
GSM4972213_AAACCTGGTCTAGGTT-1-0.59736930.9813738-1.2604122-0.1999167 0.3847095
GSM4972213_AAACCTGGTCTGATCA-1-1.01944181.2523079 0.3056075-0.8770639-0.5351299
GSM4972213_AAACCTGTCAACACAC-1-2.04823781.4868648 0.8508176-0.4472641-1.1899475
In [20]:
DimPlot(GSE163121, reduction = "harmony", group.by = "orig.ident",raster = F)
No description has been provided for this image
In [21]:
GSE163121 <- RunUMAP(GSE163121, dims = 1:9, reduction = "harmony")
12:16:38 UMAP embedding parameters a = 0.9922 b = 1.112

12:16:38 Read 24992 rows and found 9 numeric columns

12:16:38 Using Annoy for neighbor search, n_neighbors = 30

12:16:38 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%

[----|----|----|----|----|----|----|----|----|----|

*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
|

12:16:42 Writing NN index file to temp file /tmp/Rtmpv61tWw/file32c79188f6fee

12:16:42 Searching Annoy index using 1 thread, search_k = 3000

12:16:52 Annoy recall = 100%

12:16:53 Commencing smooth kNN distance calibration using 1 thread
 with target n_neighbors = 30

12:16:55 Initializing from normalized Laplacian + noise (using RSpectra)

12:16:56 Commencing optimization for 200 epochs, with 1014148 positive edges

12:16:56 Using rng type: pcg

12:17:14 Optimization finished

In [22]:
GSE163121 <- FindNeighbors(GSE163121, reduction = "harmony", dims = 1:9)
GSE163121 <- FindClusters(GSE163121, resolution = 0.5, cluster.name = "harmony_clusters")
Computing nearest neighbor graph

Computing SNN

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 24992
Number of edges: 708266

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.8399
Number of communities: 10
Elapsed time: 6 seconds
In [23]:
p1<- DimPlot(GSE163121, raster = F, reduction = "umap")
p1
p2<- DimPlot(GSE163121, raster = F, reduction = "umap", group.by = "orig.ident")
p2
p3 <- DimPlot(GSE163121, raster = F, reduction = "umap", group.by = "conditions")
p3
pdf(paste0(out_path, "UMAP_harmony.pdf"), width=8, height=6)
p1
p2
p3
dev.off()
No description has been provided for this image
No description has been provided for this image
pdf: 2
No description has been provided for this image
In [24]:
ABC_marker_1 <- c("CD19", "CD80", "CD86", "FAS", "FCRLA", "FCRLB", "FGL2", "CXCL10", "CXCR3", "NKG7", "ITGAM", "ITGB2", "TBX21", "ZBTB32", "FCER2", "CR2")
length(ABC_marker_1)
is.element(ABC_marker_1,rownames(GSE163121))
ABC_marker_1 <- ABC_marker_1[is.element(ABC_marker_1,rownames(GSE163121))]
16
  1. TRUE
  2. TRUE
  3. TRUE
  4. TRUE
  5. TRUE
  6. TRUE
  7. TRUE
  8. TRUE
  9. TRUE
  10. TRUE
  11. TRUE
  12. TRUE
  13. TRUE
  14. TRUE
  15. TRUE
  16. TRUE
In [25]:
fig1_1 <- FeaturePlot(GSE163121, features = ABC_marker_1[1:9], ncol = 3, reduction = "umap")
fig1_2 <- FeaturePlot(GSE163121, features = ABC_marker_1[10:16], ncol = 3, reduction = "umap")
pdf(paste0(out_path, "ABC_marker_feature.pdf"), width=12, height=10)
fig1_1
fig1_2
dev.off()
pdf: 2
In [26]:
source("~/projects/MK_Rcode/setZ.R")
GSE163121[["ABC_score"]] <- setZ(ABC_marker_1,rownames(GSE163121@assays$RNA$scale.data),GSE163121@assays$RNA$scale.data)
In [27]:
LSP1_gene <- c("CEBPA", "CEBPB", "IL1B", "CCL3", "TNF", "CCL4", "CCL5", "SPP1", "CD7", "MSR1", "ITGAM", "KIT", "CD14", "CSF3R", "IL1R2", "CSF1R", "ITGA1", "ITGA5", "ITGB5", "ITGB1", "CAMP", "THBS1", "C3", "MEFV", "FOS", "CYBB", "CARD9", "IRF7", "OAS3", "NLRP3", "CTSB", "IKBKE", "MPO")
length(LSP1_gene)
LSP1_gene_sub <- LSP1_gene[is.element(LSP1_gene,rownames(GSE163121))]
33
In [28]:
LSP1_gene_sub
  1. 'CEBPA'
  2. 'CEBPB'
  3. 'IL1B'
  4. 'CCL3'
  5. 'TNF'
  6. 'CCL4'
  7. 'CCL5'
  8. 'SPP1'
  9. 'CD7'
  10. 'MSR1'
  11. 'ITGAM'
  12. 'KIT'
  13. 'CD14'
  14. 'CSF3R'
  15. 'IL1R2'
  16. 'CSF1R'
  17. 'ITGA1'
  18. 'ITGA5'
  19. 'ITGB5'
  20. 'ITGB1'
  21. 'CAMP'
  22. 'THBS1'
  23. 'C3'
  24. 'MEFV'
  25. 'FOS'
  26. 'CYBB'
  27. 'CARD9'
  28. 'IRF7'
  29. 'OAS3'
  30. 'NLRP3'
  31. 'CTSB'
  32. 'IKBKE'
  33. 'MPO'
In [29]:
fig2_1 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[1:9], ncol = 3, reduction = "umap")
fig2_2 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[10:18], ncol = 3, reduction = "umap")
fig2_3 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[19:27], ncol = 3, reduction = "umap")
fig2_4 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[28:33], ncol = 3, reduction = "umap")

pdf(paste0(out_path, "LSP1_marker_feature.pdf"), width=12, height=10)
fig2_1
fig2_2
fig2_3
fig2_4
dev.off()
pdf: 2
In [30]:
GSE163121[["LSP1_score"]] <- setZ(LSP1_gene_sub,rownames(GSE163121@assays$RNA$scale.data),GSE163121@assays$RNA$scale.data)
In [43]:
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap")
GSE163121$LSP1_score_pos <- GSE163121$LSP1_score > 0
DimPlot(GSE163121, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE163121, features = "ABC_score") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "ABC_score", reduction = "umap")
GSE163121$ABC_score_pos <- GSE163121$ABC_score > 0
DimPlot(GSE163121, group.by = "ABC_score_pos", reduction = "umap")

pdf(paste0(out_path, "score_feature.pdf"), width=7, height=4.5)
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap")
DimPlot(GSE163121, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE163121, features = "ABC_score", reduction = "umap") + 
  scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "ABC_score", reduction = "umap")
DimPlot(GSE163121, group.by = "ABC_score_pos", reduction = "umap")
dev.off()
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
No description has been provided for this image
No description has been provided for this image
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
pdf: 2
No description has been provided for this image
In [32]:
B_marker <- c("CD19", "CD79A", "CD79B", "MS4A1", "CD27", "CD24", "CD38", "IGHD")
T_marker <- c("CD4", "CD8A", "CD69", "CD70", "CD80", "CD40LG", "PDCD1", "HAVCR2")
FeaturePlot(GSE163121, features = B_marker, ncol = 3, reduction = "umap")
FeaturePlot(GSE163121, features = T_marker, ncol = 3, reduction = "umap")

pdf(paste0(out_path, "marker_expression.pdf"), width=12, height=12)
FeaturePlot(GSE163121, features = B_marker, ncol = 3, reduction = "umap")
FeaturePlot(GSE163121, features = T_marker, ncol = 3, reduction = "umap")
dev.off()
No description has been provided for this image
pdf: 2
No description has been provided for this image
In [33]:
LSP1_pos_ind <- GSE163121$LSP1_score_pos==TRUE & GSE163121$ABC_score_pos==FALSE
ABC_pos_ind <- GSE163121$LSP1_score_pos==FALSE & GSE163121$ABC_score_pos==TRUE
DP_ind <- GSE163121$LSP1_score_pos==TRUE & GSE163121$ABC_score_pos==TRUE
DN_ind <- GSE163121$LSP1_score_pos==FALSE & GSE163121$ABC_score_pos==FALSE
sum(LSP1_pos_ind)
sum(ABC_pos_ind)
sum(DP_ind)
sum(DN_ind)
3542
6386
3511
11553
In [34]:
Idents(GSE163121, cells = colnames(GSE163121)[LSP1_pos_ind]) <- "LSP1_positive"
Idents(GSE163121, cells = colnames(GSE163121)[ABC_pos_ind]) <- "ABC_positive"
Idents(GSE163121, cells = colnames(GSE163121)[DP_ind]) <- "Double_positive"
Idents(GSE163121, cells = colnames(GSE163121)[DN_ind]) <- "Double_negative"
DimPlot(GSE163121, reduction = "umap")
pdf(paste0(out_path,"umap_ABC_LSP1.pdf"), width = 8, height = 6)
DimPlot(GSE163121, reduction = "umap")
dev.off()
pdf: 2
No description has been provided for this image
In [35]:
markers <- FindAllMarkers(GSE163121, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.25)
GSE163121
Calculating cluster Double_negative

Calculating cluster Double_positive

Calculating cluster ABC_positive

Calculating cluster LSP1_positive

An object of class Seurat 
33694 features across 24992 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 4 dimensional reductions calculated: pca, umap.unintegrated, harmony, umap
In [36]:
markers %>%
  group_by(cluster) %>%
  filter(avg_log2FC > 0) %>%
  filter(p_val_adj < 0.05) %>%
  arrange(desc(avg_log2FC)) %>%
  slice_head(n = 5) %>%
  ungroup() -> top5
# DoHeatmap(pbmc, features = top5$gene) + NoLegend()
DotPlot(GSE163121, features = top5$gene[!duplicated(top5$gene)] ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
pdf(paste0(out_path, "Dotplot.pdf"), width = 8, height = 6)
DotPlot(GSE163121, features = top5$gene[!duplicated(top5$gene)] ) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
dev.off()
Warning message:
“Scaling data with a low number of groups may produce misleading results”
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Warning message:
“Scaling data with a low number of groups may produce misleading results”
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
pdf: 2
No description has been provided for this image
In [37]:
unique(Idents(GSE163121))
  1. Double_negative
  2. LSP1_positive
  3. ABC_positive
  4. Double_positive
Levels:
  1. 'Double_negative'
  2. 'Double_positive'
  3. 'ABC_positive'
  4. 'LSP1_positive'
In [38]:
LSP1vsABC.marker <- FindMarkers(GSE163121, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.1, ident.1 = "LSP1_positive", ident.2 = "ABC_positive")
source("~/projects/MK_Rcode/volcano_scdata.R")
In [39]:
vol1 <- volcano_scdata(LSP1vsABC.marker, "LSP1 vs ABC")
pdf(paste0(out_path,"LSP1_vs_ABC_volcano.pdf"), width = 6, height = 6)
vol1
dev.off()
pdf: 2
No description has been provided for this image
In [40]:
hallmark_geneset <- msigdbr(species = "Homo sapiens",  category = "H")
hallmark_geneset$gs_name_short <- str_extract(hallmark_geneset$gs_name, "(?<=HALLMARK_).*$")
sort_glist <- LSP1vsABC.marker$avg_log2FC
names(sort_glist) <- rownames(LSP1vsABC.marker)
sort_glist <- sort(sort_glist, decreasing = T)
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T, pvalueCutoff = 0.99)
if (!file.exists(paste0(out_path,"gsea_GSE163121_050725.rds"))) {
    saveRDS(gsea_result.LSP1vsABS, paste0(out_path,"gsea_GSE163121_050725.rds"))
}
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
Warning message:
“The `category` argument of `msigdbr()` is deprecated as of msigdbr 10.0.0.
ℹ Please use the `collection` argument instead.”
The 'msigdbdf' package must be installed to access the full dataset.

Please run the following command to install the 'msigdbdf' package:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).


preparing geneSet collections...

GSEA analysis...

Warning message in fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
“For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.”
leading edge analysis...

done...

using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).


preparing geneSet collections...

GSEA analysis...

Warning message in fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
“For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.”
leading edge analysis...

done...

No description has been provided for this image
In [41]:
pdf(paste0(out_path, "GSEA_LSP1vsABS.pdf"), width = 8, height = 6)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
dev.off()
pdf: 2
In [42]:
if (!file.exists(paste0(out_path,"GSE163121_metadata.rds"))) {
    saveRDS(GSE163121@meta.data, file = paste0(out_path,"GSE163121_metadata.rds"))
}
In [44]:
if (!file.exists(paste0(out_path,"GSE163121_Sobj_050825.rds"))) {
    saveRDS(GSE163121, paste0(out_path,"GSE163121_Sobj_050825.rds"))
}